Revisiting Web Data Extraction Using In-Browser Structural Analysis and Visual Cues in Modern Web Designs
Authors
Abstract
Recent trends in website design affect the methods used for web data extraction. Many existing methods rely on structural analysis of web pages, but since the introduction of CSS, table-based layouts are no longer used. Responsive design, in turn, makes layout and presentation dependent on the browsing context, which complicates the use of visual cues. We present DeepDesign, a system that semi-automatically extracts data records from web pages based on a combination of structural and visual features. It runs in a general-purpose browser, taking advantage of direct access to the complete CSS3 spectrum and the capability to trigger and execute JavaScript in the page. The user sees record matching in real time and can dynamically adapt the process if required. We present the details of the matching algorithms and evaluate them on the top ten Alexa websites.
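The idea of combining structural and visual features can be illustrated with a small sketch. This is not the DeepDesign matching algorithm, which the abstract does not specify; it is a minimal illustration under stated assumptions. The `Node` class is a hypothetical stand-in for a rendered DOM element, the structural feature is assumed to be a recursive tag signature, and the visual feature is assumed to be the rendered box size with a 20% tolerance. A real in-browser implementation would read these values from the live page instead.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a rendered DOM element."""
    tag: str
    width: int = 0    # rendered width in pixels (assumed visual feature)
    height: int = 0   # rendered height in pixels (assumed visual feature)
    children: list["Node"] = field(default_factory=list)


def signature(node: Node) -> str:
    """Structural signature: the tag plus the signatures of all children."""
    inner = ",".join(signature(child) for child in node.children)
    return f"{node.tag}({inner})"


def similar_size(a: Node, b: Node, tolerance: float = 0.2) -> bool:
    """Visual check: widths and heights differ by at most `tolerance`."""
    def close(x: int, y: int) -> bool:
        return abs(x - y) <= tolerance * max(x, y, 1)
    return close(a.width, b.width) and close(a.height, b.height)


def find_records(root: Node, min_records: int = 2) -> list[list[Node]]:
    """Return groups of sibling nodes that look like repeated data records."""
    groups = []
    by_signature = defaultdict(list)
    for child in root.children:
        by_signature[signature(child)].append(child)
    for candidates in by_signature.values():
        # Keep only candidates whose box size is close to the first one.
        matched = [c for c in candidates if similar_size(candidates[0], c)]
        if len(matched) >= min_records:
            groups.append(matched)
    # Recurse so that record lists nested deeper in the page are also found.
    for child in root.children:
        groups.extend(find_records(child, min_records))
    return groups


def item() -> Node:
    """Build one example record: a heading and a short text span."""
    return Node("li", 300, 80, [Node("h2", 300, 30), Node("span", 120, 20)])


# Example page: three similar list items between a header and a footer.
page = Node("body", 1000, 900, [
    Node("header", 1000, 100),
    Node("ul", 300, 260, [item(), item(), item()]),
    Node("footer", 1000, 50),
])
records = find_records(page)
```

The sketch treats any repeated sibling structure as a record candidate, so it would also group identical leaf elements inside a single record. A practical system would need further filtering, and the semi-automatic approach described above lets the user correct such matches.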
Similar resources
Designing and Implementing a 3D Indoor Navigation Web Application
In recent years, the need has arisen for indoor navigation systems that guide people during natural hazards and fires, because human settlements have become increasingly complex. This research paper aims to design and implement a visual indoor navigation web application. The designed system processes CityGML data model automatically and then, extracts semantic, topologic and geometric...
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. In our extraction algorithm, we are inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore
Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...
A Comparative Study of Performance of Adaptive Web Sampling and General Inverse Adaptive Sampling in Estimating Olive Production in Iran
Nowadays, there is an increasing use of sampling methods in network and spatial populations. Although the most common link-tracing designs such as adaptive cluster sampling and snowball sampling have advantages over conventional sampling designs such as simple random sampling and cluster sampling, these designs still present many drawbacks. Adaptive web sampling is a new link-tracing design tha...
Sociological Impact of Using Digital (Web-based) Analyses on Performance Measurement and Optimization of Digital Marketing among Young Managers (Case study: Digital-based Companies in Tehran)
This research aims to study the effect of using digital (web-based) analyses in performance measurement and optimization of digital marketing in digital-based companies in Tehran. The data collection tool was a researcher-made questionnaire. A panel of experts and supervisor were asked to measure the validity of the questionnaire. For reliability analysis of this tool, Cronbach’s alpha test was...